Identifying communities (or clusters), namely groups of nodes with
comparatively strong internal connectivity, is a fundamental task for
deeply understanding the structure and function of a network. Yet,
there is a lack of formal criteria for defining communities and for
testing their significance. We propose a sharp definition which is
based on a significance threshold. By means of a lumped Markov
chain model of a random walker, a quality measure called
"persistence probability" is associated with a cluster. The cluster is
then defined as an "α-community" if this probability is not smaller
than α. Consistently, a partition composed of α-communities is an
"α-partition". These definitions turn out to be very effective for
finding and testing communities. If a set of candidate partitions is
available, setting the desired α-level allows one to immediately
select the α-partition with the finest decomposition. Simultaneously,
the persistence probabilities quantify the significance of each single
community. Given its ability to assess the quality of each cluster
individually, this approach can also disclose single well-defined
communities even in networks which overall do not possess a definite
clusterized structure.

Complex networks are currently one of the most extensively
studied subjects in the field of applied mathematics.
In the last fifteen years, a huge number of theoretical
results have been put forward, and almost every field of science
and technology has benefited from the application of such results
to specific problems [1, 2, 3, 4].

One of the most promising but challenging tasks in net- 
work science is community analysis, which is aimed at reveal- 
ing possible partitions of a network into subsets of nodes (com- 
munities, or clusters) with dense intra- but sparse inter-group 
connections. Finding and analyzing such partitions often pro- 
vides invaluable help in deeply understanding the structure 
and function of a network, as widely demonstrated by several 
case studies in social sciences [5, 6], biology [7], economics [8],
or information science [9], just to name a few.

Despite the abundance of contributions on this subject 
(see [10] for a survey), the issue of community analysis cannot 
be considered satisfactorily solved. First of all, finding com- 
munities is a computationally hard task, because the "best" 
partition must be sought for in a set whose cardinality grows 
faster than exponentially with the number of nodes. The ex- 
haustive enumeration of the partitions is thus impossible, and 
heuristic techniques must be employed. Secondly, and perhaps
more importantly, there is no widespread consensus on formal
criteria for defining communities and for testing their
significance [10]. When can a subnetwork actually be considered
to form a community, namely a group of nodes with comparatively
strong internal connectivity? Probably the most important attempt
to answer this question was put forward by Newman and coworkers
[11, 12, 13], who defined a quality index called modularity which
quantifies, for a given partition of the network into candidate
communities, to what extent the distribution of the
intra-/inter-community edges is anomalous with respect to a
suitably defined random network. Since high modularity values are
obtained in the presence of groups of nodes with comparatively
large intra-community edge density, maximizing modularity should
highlight the "best" partition. This method has proven successful
in many circumstances but, on the other hand, it has been widely
demonstrated that, due to intrinsic limitations, it does not
necessarily yield a significant partition [13, 14, 10]. And even
when it does, it quantifies the quality of a partition but not of
each individual community.

This paper introduces a sharp definition of community
which is based on a threshold of significance. More precisely,
once a level $0 < \alpha < 1$ is specified, a node cluster is defined
to be an α-community if the probability that a random walker,
which is currently in one of the cluster's nodes, remains in the
cluster in the next step is not smaller than α. Such a probability
is obtained from an approximate lumped Markov chain
model of the random walker (i.e., a reduced-order Markov
chain in which the communities of the original network become
nodes), which is easily derived from the original (high-order)
Markov chain model. Consistently, a partition composed
of α-communities is defined to be an α-partition.

If equipped with an effective method for generating a set
of "good" candidate partitions, the notions of α-community
and α-partition provide a framework for simultaneously finding
communities and testing their significance. To this end, the
desired significance level α is first fixed. Then, a family of
partitions is derived and each partition is immediately checked to
assess whether it is formed by α-communities. This allows one
to identify the α-partitions, and to select one of them. Typically,
one searches for communities which are at the same
time small (to effectively decompose the network) and significant
(with much more internal than external connectivity).
A guideline is therefore to select, among the available
α-partitions, the one with the largest number of communities.

But the notion of α-community can also be useful in a
partially different way. It may happen that, for a given
significance level α, no α-partitions are found. Yet, one or a few
α-communities could exist. They correspond to strongly connected
groups of nodes, even in a network which, overall, does
not possess a definite clusterized structure. Or, finally, one
can assess the significance of the results of a single-partition 
method, such as modularity optimization 5 , and obtain an 
immediate assessment of the α-significance of each single community
and, consequently, of the entire partition.

In the paper, we first introduce the lumped Markov chain
model of the random walker and define the notions of
α-community and α-partition. Testing the α-significance of a
given community or partition turns out to be computationally
very cheap. Then we analyze a few
examples of application and, to this end, we propose an effective
algorithm for deriving a meaningful set of partitions. The
algorithm, which applies hierarchical cluster analysis, is again
based on the Markov chain model of a random walker and,
consequently, it involves a notion of similarity/distance among
nodes which is consistent with the significance criterion introduced
above. We finally compare this approach, which can be
applied to fully general networks (i.e., directed and weighted),
with other community analysis methods having a similar
philosophy.

Networks, α-Communities, and α-Partitions

Consider a network with nodes $\mathcal{N} = \{1, 2, \ldots, N\}$ and L edges.
We consider the most general case of directed and weighted
network, and we denote by $W = [w_{ij}]$ the $N \times N$ weight
matrix, where $w_{ij} \ge 0$ is the weight of the edge $i \to j$. The
connectivity matrix $A = [a_{ij}]$ is the $N \times N$ binary matrix
where $a_{ij} = 1$ if $w_{ij} > 0$, and $a_{ij} = 0$ otherwise. If the
network is actually undirected we have $W = W'$ and $A = A'$,
and if it is unweighted we let $W = A$ (i.e., all weights equal to
1). We assume that the network is strongly connected (e.g.,
[3]), namely there exists an oriented path from any i to any
j. If the network is directed, for each node i we define the
(total) degree as $k_i = k_i^{in} + k_i^{out} = \sum_j a_{ji} + \sum_j a_{ij}$,
whereas $k_i = \sum_j a_{ji} = \sum_j a_{ij}$ for undirected network. The
average degree is given by $\langle k \rangle = \sum_i k_i / N$. Similarly,
for a directed network the in-, out-, and total strength of node i
are given by $s_i^{in} = \sum_j w_{ji}$, $s_i^{out} = \sum_j w_{ij}$, and
$s_i = s_i^{in} + s_i^{out}$, respectively, and the total network
strength by $s = \sum_{ij} w_{ij}$. If the network is undirected we have
instead $s_i = s_i^{in} = s_i^{out} = \sum_j w_{ji} = \sum_j w_{ij}$ and
$s = \sum_{ij} w_{ij} / 2$.

An N-state Markov chain $\pi_{t+1} = \pi_t P$, with
$\pi_t = (\pi_{1,t}\ \pi_{2,t} \ldots \pi_{N,t})$, can be associated to the
N-node network by row-normalizing the weight matrix W, namely by
letting the transition probability from i to j equal to

$$p_{ij} = \frac{w_{ij}}{\sum_k w_{ik}} = \frac{w_{ij}}{s_i^{out}}. \qquad [1]$$

The quantity $p_{ij}$ is the probability that a random walker which
is in node i jumps to node j, and $\pi_{i,t}$ is the probability of
being in node i at time t. The transition matrix $P = [p_{ij}]$ is a
row-stochastic (or Markov) matrix ($0 \le p_{ij} \le 1$ for all i, j and
$\sum_j p_{ij} = 1$ for all i). Furthermore, P is irreducible since the
network is strongly connected. This implies that the equation
$\pi = \pi P$ has a unique solution $\pi$, which is strictly positive
($\pi_i > 0$ for all i) [15] and corresponds to the stationary Markov
chain state probability distribution. For undirected networks one
can easily check that $\pi = (s_1\ s_2 \ldots s_N)/(2s)$, whereas for
directed networks a general closed form does not exist and $\pi$
has to be numerically computed.
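
As an illustration of the row normalization and of the numerical computation of the stationary distribution, the following is a minimal sketch in plain Python. It is not the authors' code: the function names, the damped power iteration, its tolerance, and the 4-node weight matrix are all assumptions made for the example.

```python
def transition_matrix(W):
    """Row-normalize the weight matrix W (list of rows) into P."""
    P = []
    for row in W:
        s_out = sum(row)  # out-strength of the node
        if s_out <= 0:
            raise ValueError("each node needs at least one outgoing edge")
        P.append([w / s_out for w in row])
    return P


def stationary_distribution(P, tol=1e-13, max_iter=100000):
    """Approximate the solution of pi = pi P by damped power iteration.

    Each step averages pi with pi P; this keeps the same fixed point but
    avoids oscillations when the chain is periodic.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        step = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        new = [0.5 * (a + b) for a, b in zip(pi, step)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")


# Toy undirected, weighted network (hypothetical data).
W = [
    [0, 2, 1, 0],
    [2, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
P = transition_matrix(W)
pi = stationary_distribution(P)

# Undirected case: closed form pi_i = s_i / (2s), where 2s = sum of strengths.
strengths = [sum(row) for row in W]
closed_form = [s_i / sum(strengths) for s_i in strengths]
```

On this undirected example the iterated `pi` can be compared with `closed_form`; for a directed network only the numerical route is available.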

We denote by $\mathbb{P}_q$ a partition of $\mathcal{N}$ in q subsets (or
sub-networks), namely $\mathbb{P}_q = \{C_1, C_2, \ldots, C_q\}$ with
$\bigcup_c C_c = \mathcal{N}$ and $C_c \cap C_d = \emptyset$ for all
$c \neq d$. In rough terms, the sub-network $C_c$ is called a community
(or cluster) if it has a high internal density of weight, i.e., if
the total weight of the edges internal to $C_c$ is much larger than
that of the edges connecting $C_c$ to the rest of the network. The
community analysis of a given network consists therefore in finding
the "best" partition $\mathbb{P}$, according to some criterion. Despite
a huge amount of contributions, there is however no widespread
consensus on formal criteria for defining communities and for
testing their significance [10]. As a consequence, in many situations
a more fruitful approach is that of searching for a few "good"
partitions $\mathbb{P}', \mathbb{P}'', \ldots$, among which one selects with
common sense and experience.

Defining a partition $\mathbb{P}_q$ induces a q-state meta-network,
where communities become meta-nodes. The rigorous description
of the dynamics of the random walker at this scale by a
lumped Markov chain, however, is possible only in special
cases [16]; actually, the Markovian property is not even
preserved in general. Despite this limitation, a q-state Markov
chain can be defined, which correctly describes the random
walker at the aggregate level provided the stochastic process is
started at the stationary distribution $\pi$ [17, 18]. This lumped
Markov chain is defined by the $q \times q$ row-stochastic matrix

$$U = [\mathrm{diag}(\pi H)]^{-1} H' \mathrm{diag}(\pi) P H, \qquad [2]$$

where H (collecting matrix) is an $N \times q$ binary matrix coding
the partition $\mathbb{P}_q$, i.e., its entry $h_{ic}$ is 1 if and only if
node $i \in C_c$. The lumped Markov chain $\Pi_{t+1} = \Pi_t U$ shares
the stationary distribution with the original one (suitably
collected), namely $\Pi = \pi H$ satisfies $\Pi = \Pi U$. On the contrary,
starting from an arbitrary $\pi_0$, the lumped Markov chain
$\Pi_{t+1} = \Pi_t U$ started at $\Pi_0 = \pi_0 H$ provides, in general,
only an approximate description of the evolution of $\pi_t H$. The
difference between the real and approximate $\Pi_t$, however, tends
exponentially to zero if the two chains are regular [15], since they
converge, by definition, to the same stationary state.
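
The construction of U can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `lumped_matrix`, the encoding of H as a list of community labels, and the 6-node network (two triangles joined by one edge) are hypothetical choices, not taken from the paper.

```python
def lumped_matrix(P, pi, labels, q):
    """Return (U, Pi) with U = [diag(pi H)]^-1 H' diag(pi) P H and Pi = pi H.

    labels[i] in {0, ..., q-1} is the community of node i; it encodes the
    collecting matrix H without storing it explicitly.
    """
    n = len(P)
    Pi = [0.0] * q
    flow = [[0.0] * q for _ in range(q)]  # entries of H' diag(pi) P H
    for i in range(n):
        c = labels[i]
        Pi[c] += pi[i]
        for j in range(n):
            flow[c][labels[j]] += pi[i] * P[i][j]
    U = [[flow[c][d] / Pi[c] for d in range(q)] for c in range(q)]
    return U, Pi


# Toy example (hypothetical data): two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
W = [[0.0] * n for _ in range(n)]
for i, j in edges:
    W[i][j] = W[j][i] = 1.0
strength = [sum(row) for row in W]
P = [[w / strength[i] for w in W[i]] for i in range(n)]
pi = [s / sum(strength) for s in strength]  # closed form, undirected case

labels = [0, 0, 0, 1, 1, 1]  # one community per triangle
U, Pi = lumped_matrix(P, pi, labels, q=2)

# Stationarity of the lumped chain: Pi U should reproduce Pi.
Pi_next = [sum(Pi[c] * U[c][d] for c in range(2)) for d in range(2)]
```

For this partition each triangle has persistence probability 6/7, since six of the seven edge endpoints attached to its nodes lie inside the triangle.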

The ability of the lumped Markov chain to describe the
random walk dynamics only at stationarity is not a limitation
for our purposes, as will be demonstrated by the examples
of application. Note that the entry $u_{cd}$ of U is the probability
that the random walker is at time (t + 1) in any of the
nodes of community d, provided it is at time t in any of the
nodes of community c. The diagonal term $u_{cc}$ is called the
persistence probability of community c. Large values of $u_{cc}$ are
expected for significant communities. In fact, the expected
escape time from $C_c$ is $\tau_c = (1 - u_{cc})^{-1}$: the walker will
spend a long time within the same community if the weights
of the internal edges are comparatively large with respect to
those pointing outside. Given a value $0 < \alpha < 1$, $C_c$ is
defined to be an α-community if $u_{cc} \ge \alpha$. Thus α acts as a
selection parameter, as it sharply qualifies communities with
respect to a given threshold of significance. Consistently,
$\mathbb{P}_q$ is defined to be an α-partition if it is composed of
α-communities, namely $u_{cc} \ge \alpha$ for all c = 1, 2, ..., q.
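
For an undirected network the persistence probabilities take a particularly simple form, because $\pi_i p_{ij} = w_{ij}/(2s)$: $u_{cc}$ is the weight internal to the community (counted in both directions) divided by the total strength of its nodes. The sketch below uses this shortcut; the function names and the toy network are hypothetical, and the directed case would instead require the numerically computed stationary distribution.

```python
def persistence_undirected(W, labels, q):
    """Persistence probabilities u_cc for an undirected weighted network.

    Uses u_cc = (sum of w_ij with i, j both in c) / (sum of s_i with i in c),
    which follows from pi_i = s_i / (2s) and p_ij = w_ij / s_i.
    """
    n = len(W)
    internal = [0.0] * q
    total = [0.0] * q
    for i in range(n):
        for j in range(n):
            total[labels[i]] += W[i][j]
            if labels[i] == labels[j]:
                internal[labels[i]] += W[i][j]
    return [internal[c] / total[c] for c in range(q)]


def is_alpha_partition(u, alpha):
    """True if every community has persistence probability >= alpha."""
    return min(u) >= alpha


def escape_time(u_cc):
    """Expected escape time 1 / (1 - u_cc) from a community."""
    return float("inf") if u_cc >= 1.0 else 1.0 / (1.0 - u_cc)


# Toy example (hypothetical data): two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = [[0.0] * 6 for _ in range(6)]
for i, j in edges:
    W[i][j] = W[j][i] = 1.0

natural = persistence_undirected(W, [0, 0, 0, 1, 1, 1], q=2)
broken = persistence_undirected(W, [0, 0, 1, 2, 2, 2], q=3)  # node 2 split off
```

With a threshold of 0.8 the two-triangle partition passes the test, while the partition that isolates node 2 fails it because the broken pieces have much lower persistence.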

Consider the simple 12-node network of Fig. 1 [19], which
is purposely composed of three clusters. Four partitions are
considered, corresponding to finer and finer divisions, and the
$u_{cc}$-s are computed for each candidate community. As long as
the latter coincide with the "natural" communities, or with
the union of two of them, all the $u_{cc}$-s are rather large. But
as soon as a natural community is broken, some very low
persistence probabilities are found. This result can be used in a
twofold manner, as extensively shown in the next section. On
the one hand, if a set of finer and finer partitions is analyzed, the
sudden drop of the $u_{cc}$-s signals that a significant community
has been broken: the network decomposition must be stopped
before this event. On the other hand, if a single partition is
given and its significance has to be assessed, the $u_{cc}$-s
immediately quantify the quality not only of the partition but also
of each individual community.



Applications and Examples 

The proposed method is now applied to assess the significance 
of sets of partitions related to a variety of networks. An algo- 
rithm for deriving partitions is first introduced, implementing 
hierarchical cluster analysis after a random-walk-based node 
distance is defined. Then the results related to three networks 
are discussed: a synthetic benchmark network with built-in
cluster structure; a real-world network with a rather strong
community structure; and another real-world network with
weak clustering but with a few well-defined communities.

Deriving Partitions. Cluster analysis can be used to group
"similar nodes" into candidate communities. This requires
defining a meaningful similarity/distance between each pair of
nodes. Such a definition is by no means obvious: among the
many proposals [10], a few exploit random walks to induce
a suitable similarity measure (e.g., [20, 21, 22, 23, 24]). We
follow this line by proposing an approach in which, however,
we do not explicitly perform random walks in a Monte Carlo
fashion, but derive analytically the global behavior of a large
number M of walkers (a "fleet") started from each node i.

Consider a large number M of repetitions of a random
walk started from i. For each repetition, the probability that
the walker is in j after t steps is $[P^t]_{ij}$. Thus, if M random
walks of length T are performed from i, the expected number
of visits to j in any time instant in $1 \le t \le T$ is
$M \sum_{t=1}^{T} [P^t]_{ij}$. By averaging with respect to M, we propose
a (symmetric) similarity $\sigma_{ij}$ defined by

$$\sigma_{ij} = \sigma_{ji} = \sum_{t=1}^{T} \left( [P^t]_{ij} + [P^t]_{ji} \right). \qquad [3]$$

Note that this is conceptually equivalent to an explicit random
walk approach, but with an arbitrarily large number M of
repetitions from each starting node instead of one only. Most
notably, the results do not depend on the actual stochastic
realization of the random walks. We finally define the distance
$d_{ij} = d_{ji}$ between nodes (i, j) by complementing the similarity
and normalizing the results between 0 and 1:

$$d_{ij} = d_{ji} = 1 - \frac{\sigma_{ij} - \min \sigma_{ij}}{\max \sigma_{ij} - \min \sigma_{ij}}. \qquad [4]$$

The rationale underlying the definition of $\sigma$ and d is to assign
nodes (i, j) a large similarity if a numerous fleet of random
walkers started in i (resp. j) makes a large number of visits
to j (resp. i) within a sufficiently small time horizon T. The
notion of community induced by this metric, therefore, is that
of a subnetwork where a random walker has a large probability
of circulating for quite a long time, before eventually leaving
to reach another group. This is conceptually consistent with
the definition of α-community introduced above. The choice
of the time horizon T is potentially critical: if T is too large, the
probability of visiting a given state j becomes independent of
the starting state since it tends to $\pi_j$, whereas if T is too small
the information gathered is insufficient. We will return later
to this point.
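
The similarity [3] and the distance [4] can be computed directly from powers of P. The following plain-Python sketch is illustrative only: the function names and the toy network are hypothetical, and it assumes that the minimum and maximum in [4] are taken over pairs with i different from j and that the diagonal of d is set to zero, which the text does not state explicitly.

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def similarity(P, T):
    """sigma_ij = sum over t = 1..T of ([P^t]_ij + [P^t]_ji)."""
    n = len(P)
    acc = [[0.0] * n for _ in range(n)]
    Pt = [row[:] for row in P]  # current power, starting from P^1
    for _ in range(T):
        for i in range(n):
            for j in range(n):
                acc[i][j] += Pt[i][j]
        Pt = matmul(Pt, P)
    return [[acc[i][j] + acc[j][i] for j in range(n)] for i in range(n)]


def distance(sigma):
    """d_ij = 1 - (sigma_ij - min) / (max - min).

    Assumption: min and max range over off-diagonal pairs, and d_ii = 0.
    """
    n = len(sigma)
    off = [sigma[i][j] for i in range(n) for j in range(n) if i != j]
    lo, hi = min(off), max(off)
    if hi == lo:
        raise ValueError("all similarities are equal; distance is undefined")
    return [
        [0.0 if i == j else 1.0 - (sigma[i][j] - lo) / (hi - lo) for j in range(n)]
        for i in range(n)
    ]


# Toy example (hypothetical data): two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = [[0.0] * 6 for _ in range(6)]
for i, j in edges:
    W[i][j] = W[j][i] = 1.0
P = [[w / sum(row) for w in row] for row in W]

sigma = similarity(P, T=3)
d = distance(sigma)
```

In this example nodes 0 and 1, which share a triangle, end up closer than nodes 0 and 5, which can reach each other only through the bridge in three steps.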

LFR benchmark. Lancichinetti, Fortunato, and Radicchi
(LFR) [25] proposed a family of synthetically generated
graphs, explicitly designed to serve as benchmarks for testing
community detection algorithms. They take into
account two properties found in real networks, namely the
heterogeneity in the distributions of node degrees and
community sizes. Both of the latter are taken as power laws,
with prescribed exponents $\gamma$ and $\beta$, respectively. In addition,
the network is defined by prescribing the number N of nodes,
the average degree $\langle k \rangle$, and a mixing parameter $\mu$, such that
each node shares a fraction $1 - \mu$ of its edges with the other
nodes of its own community, and a fraction $\mu$ with the rest
of the network. The benchmark generating method was later
extended to oriented and weighted networks [26]; here we
consider an example of an undirected, unweighted network with
N = 1000, $\langle k \rangle = 20$, $\mu = 0.25$, $\gamma = 2$, and $\beta = 1$. The network
we obtained turns out to be formed by 38 communities, with
sizes ranging from 10 to 49 nodes each.

Cluster analysis yields a different dendrogram for each
time horizon T, whose choice is thus nontrivial. At the two
extremes, setting T = 1 restricts the pairs of nodes which
are candidates for nonzero similarity to neighboring pairs only,
whereas larger and larger values of T tend to make any node
equally similar to any other. We found that an effective
selection of T can be empirically obtained by maximizing the
cophenetic correlation coefficient C, which is defined as the
linear correlation between the distances $d_{ij}$ and the cophenetic
distances $d^c_{ij}$ [27]. The latter are a product of the hierarchical
cluster analysis: for any node pair (i, j), the cophenetic
distance $d^c_{ij}$ is the height of the link joining (directly or
indirectly) nodes (i, j) in the dendrogram. The value of C is
generally used to assess whether the adopted distance $d_{ij}$ induces
an effective clusterization (notice that C qualifies the entire
dendrogram, and not a network partition), although limitations
have been observed in specific applications [28]. Figure 2
shows the dependence of C on T: we take T = 12, which
attains the maximum C = 0.905. The related dendrogram is
in the same figure.
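
To make the cophenetic correlation coefficient concrete, here is a small plain-Python sketch that builds a dendrogram and correlates the input distances with the cophenetic ones. It is a hedged illustration: average linkage is an assumption (the linkage rule is not specified in this passage), the function names are hypothetical, and the 4-node distance matrix is invented. In the procedure described above, C would be evaluated once for each candidate T and the T with the largest C retained.

```python
from itertools import combinations
from math import sqrt


def cophenetic_matrix(D):
    """Cophenetic distances from average-linkage agglomerative clustering.

    D is a symmetric distance matrix (list of rows). Average linkage is an
    assumption of this sketch.
    """
    n = len(D)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member nodes
    coph = [[0.0] * n for _ in range(n)]
    next_id = n
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        best = None
        for a, b in combinations(clusters, 2):
            h = sum(D[i][j] for i in clusters[a] for j in clusters[b])
            h /= len(clusters[a]) * len(clusters[b])
            if best is None or h < best[0]:
                best = (h, a, b)
        h, a, b = best
        # Every node pair split across the merged clusters joins at height h.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = h
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return coph


def cophenetic_correlation(D):
    """Linear correlation between d_ij and cophenetic distances over i < j."""
    n = len(D)
    coph = cophenetic_matrix(D)
    pairs = list(combinations(range(n), 2))
    x = [D[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


# Hypothetical distances with two tight groups, {0, 1} and {2, 3}.
D = [
    [0.00, 0.10, 0.90, 0.80],
    [0.10, 0.00, 0.85, 0.95],
    [0.90, 0.85, 0.00, 0.20],
    [0.80, 0.95, 0.20, 0.00],
]
coph = cophenetic_matrix(D)
C = cophenetic_correlation(D)
```

Because the example has a clear two-group structure, the dendrogram reproduces the input distances well and C is close to 1.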

Horizontal top-down cross-sections of the dendrogram
identify a sequence $\mathbb{P}_2, \mathbb{P}_3, \ldots$ of partitions with
increasing number of candidate communities. For each $\mathbb{P}_q$ we
compute U according to [2], and plot its diagonal terms in the
persistence probabilities' diagram of Fig. 3. The diagram reveals a
sharp discontinuity. For $q \le 38$, all the $u_{cc}$-s are rather large
($u_{cc} \ge 0.735$ for all c). This means that significant communities
are identified: in rigorous terms, all the proposed partitions
$\mathbb{P}_q$ with $2 \le q \le 38$ are α-partitions with α = 0.735. For
$q \ge 39$ significant communities are broken, as revealed by the
sudden drop of a larger and larger number of $u_{cc}$-s. Recall
that 38 is exactly the number of communities planted in the
synthetically generated network. It is worth mentioning that,
if we search for the max-modularity partition (we used the
so-called "Louvain algorithm" [29], proved to be one of the most
reliable [30]), we obtain a partition with q = 34 communities,
with modularity Q = 0.714. The number of communities of
the planted partition is thus not perfectly recovered (small
communities tend to be aggregated). Nonetheless, for the
obtained $\mathbb{P}_{34}$ partition the persistence probabilities are in the
range $0.737 \le u_{cc} \le 0.772$, which is qualitatively consistent
with the results of Fig. 3.
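
Once the persistence probabilities of each candidate partition are available, reading the diagram amounts to a simple selection rule. The sketch below shows one way to encode it in Python; the function names and the numerical values are hypothetical and are not results from the paper.

```python
def finest_alpha_partition(persistence, alpha):
    """Largest q whose partition is an alpha-partition, or None if none is.

    persistence maps q to the list of the q persistence probabilities u_cc.
    """
    ok = [q for q, u in persistence.items() if min(u) >= alpha]
    return max(ok) if ok else None


def alpha_communities(persistence, alpha):
    """For each q, the indices of the clusters with u_cc >= alpha."""
    return {
        q: [c for c, u_cc in enumerate(u) if u_cc >= alpha]
        for q, u in persistence.items()
    }


# Hypothetical persistence probabilities for three nested partitions.
persistence = {
    2: [0.95, 0.93],
    3: [0.90, 0.88, 0.86],
    4: [0.90, 0.88, 0.45, 0.40],  # a significant community has been broken
}

best_q = finest_alpha_partition(persistence, alpha=0.8)
too_strict = finest_alpha_partition(persistence, alpha=0.99)
survivors = alpha_communities(persistence, alpha=0.8)
```

The second helper covers the case in which no α-partition exists at the chosen level but some individual α-communities do.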

We have finally compared the built-in partition planted in
the LFR benchmark network with the partition $\mathbb{P}_{38}$ obtained
with our method, as well as with the "max-modularity"
partition. The comparison is in terms of the normalized mutual
information I, a reliable and often used measure of partition
similarity, introduced to the network research community
by [31]. Here we only point out that I = 1 when the two
partitions are identical, whereas I has zero expected value for
independent partitions. We obtained I = 0.992 for the partition
resulting from our method (actually, we checked that as
few as 0.08% of the pairs i, j are misclassified), and a slightly
smaller I = 0.987 for the "max-modularity" partition.
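
For reference, a normalized mutual information between two partitions can be computed as follows. This sketch assumes the commonly used normalization $I = 2\, I(A;B) / (H(A) + H(B))$, where the partitions are given as lists of community labels; the function name and the example labelings are hypothetical.

```python
from collections import Counter
from math import log


def normalized_mutual_information(a, b):
    """NMI = 2 I(A;B) / (H(A) + H(B)) for two label lists of equal length."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("label lists must be non-empty and of equal length")
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    mi = sum(
        c / n * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )
    if h_a + h_b == 0.0:
        return 1.0  # both partitions consist of a single cluster
    return 2.0 * mi / (h_a + h_b)


# Identical partitions up to a relabeling of the communities.
same = normalized_mutual_information([0, 0, 1, 1, 2, 2], [5, 5, 7, 7, 9, 9])
# Two partitions that carry no information about each other.
unrelated = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

The measure is invariant to how the communities are numbered, which is why the first comparison yields the maximum value.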

Netscience network. The Netscience network is a weighted,
undirected, social network describing the collaborations (up to
year 2006) among researchers in network science, the weight
of the edge connecting two researchers being proportional to
the number of papers they have co-authored [12]. Its giant
component has N = 379 nodes, and it is generally considered
an example of a real network with a rather strong community
structure. Many methods for network analysis, including
community detection algorithms, have been tested and discussed
on this example (e.g., [32, 33, 34]).

At T = 6 we get the dendrogram with largest C, and
the resulting persistence probabilities' diagram is in Fig. 4.
The plot has a less clear structure than that of the LFR
network (Fig. 3): the proper q must be selected with a trade-off
between a finer decomposition (large q) and a higher
significance of the communities (small q). For example, all the
partitions with q up to 10 are α-partitions with α > 0.9. But,
if less stringent significance levels are required, partitions with
$q \le 27$, or even $q \le 35$, seem to be perfectly meaningful.

It is instructive to compare these results with those
obtained, on the same case study, by the graph stability
approach proposed by Delvenne et al. [32] (a detailed comparison
of the two methods is in the next section). By means of
the KVV algorithm [33] (a hierarchical, divisive, non-binary,
graph clustering method), they obtain a sequence of six
partitions, with q = 2, 3, 5, 15, 17, 21. Analyzing and comparing
the stability curve (i.e., the autocovariance function of a signal
emitted by a random walker) of each of them, the authors
suggest their partition with q = 5 as the most reliable, as
it has the largest stability over a longer time span than
any other. We created the persistence probabilities'
diagram of the six partitions of [32], and compared it with
our diagram in Fig. 5. The partition with q = 5 of [32] is confirmed
to be definitely more significant than those with finer
decomposition (i.e., q = 15, 17, 21) according to our criterion too.
Actually, our and their $\mathbb{P}_5$ partitions share the same minimal
$u_{cc} = 0.952$, due to a common 22-node community. The
two partitions are, however, partially different (the normalized
mutual information is I = 0.886, with about 6% of differently
classified node pairs).

The inspection of Fig. 5 also reveals that, for each given
q, the partitions obtained with our method are superior to
those proposed in [32], provided the criterion put forward in
this paper is adopted. In fact, they are α-partitions with an
α value which is larger (or at least not smaller) in all six
cases. Actually, while the criterion of [32] ranks partitions
by "averaging" among the communities, our approach is a
"worst-case" one: by selecting an α-partition one guarantees
that the "worst" community has a persistence probability not
less than α. Finally, note that in the gap from q = 6 to 15,
where no partition is obtained by the KVV divisive algorithm,
our partition generating algorithm provides a set of finer and
finer partitions, whose quality only slowly deteriorates as q
increases. The analyst of the network can fruitfully select in
this interval a proper trade-off between fine granularity and
significance of the partition.



Neural network. The third example concerns a directed,
weighted network, representing the neural connections of the
worm Caenorhabditis elegans. Starting from Watts and
Strogatz's seminal work [36], different versions of this graph have
become a standard benchmark for network analysis. We
consider the directed, weighted version (whose largest connected
component has N = 239 nodes), which does not display a
definite community structure. In fact, the maximum modularity
(estimated as in [29]) is rather small, namely Q = 0.486, if
compared to other examples of comparable size (e.g.,
Q = 0.831 for the Netscience network). The less clusterized
structure emerges even visually from the dendrogram of Fig. 6,
where only few groups of nodes appear well separated from
the rest (compare, e.g., with Fig. 2).

We show that our method is able to detect such groups,
namely to isolate well-defined communities even in a network
which overall does not possess a definite clusterized
structure. Consider the persistence probabilities' diagram of Fig. 7.
With the exception of the trivial cases q = 2 and 3, no
α-partition exists with α reasonably large. Nonetheless, a few
α-communities with α > 0.8 appear and are stably detected
in a rather wide range of q. More precisely, the same set of
five communities with $u_{cc} \ge 0.826$ is revealed in the range
$14 \le q \le 20$. They are clusters, of size ranging from 18
to 29 nodes, with comparatively strong internal
connectivity. Any other candidate cluster, instead, turns out to
have a much smaller $u_{cc}$ value and, therefore, it cannot be
considered to be a significant community.



Discussion and Conclusions 

In this paper, we have shown that associating a lumped 
Markov chain to a given network partition (i.e., a set of com- 
munities) provides an effective tool for testing the significance 
of each single community and, consequently, of the entire par- 
tition. As a matter of fact, the diagonal terms (called per- 
sistence probabilities) of the lumped Markov matrix can be 
used as quality measures for each individual community. If a
threshold level $0 < \alpha < 1$ is fixed, a sharp criterion for defining
a community as "significant" is therefore that of requiring
that its persistence probability is not less than α.

If an effective method for generating a set of "good" par- 
titions is available, the above criterion can be used to rapidly 
select one of them among those complying with the prescribed
α-significance, typically the one with the finest network
decomposition (i.e., the largest number of communities). We
have used a generator of partitions based on hierarchical clus- 
ter analysis, where the node distance is again defined on the 
basis of a Markov chain random walk model. Overall, the 
method has fair computational requirements, and can be ap- 
plied to fully general networks (i.e., directed and weighted). 
Its effectiveness has been demonstrated on several medium- 
scale examples. 

The proposed approach has important connections with
two recently published community analysis methods.
Delvenne et al. [32] show that the autocorrelation function
of a signal emitted by a random walker, with value c as
long as the walker is in a node $i \in C_c$, can be expressed
in terms of the clustered autocovariance matrix
$R_t = H' [\mathrm{diag}(\pi) P^t - \pi'\pi] H$, and they define the stability
of the partition H as $r_t^H = \min_{s=0,1,\ldots,t} \mathrm{trace}(R_s)$. Given a
set of candidate partitions, the graph stability function
$r_t = \max_H r_t^H$ highlights, for each time instant t, which is
the "optimal" partition. It is suggested in [32] that the most
relevant partitions are those which are optimal over long time
windows. It is straightforward to check that our matrix U
is related to the step-1 autocovariance $R_1$ by
$R_1 + \Pi'\Pi = \mathrm{diag}(\Pi) U$. The two methods are thus based on the
same ground, but our approach has two advantages. First, for each
partition H we do not compute a long time-dependent sequence
$R_1, R_2, \ldots, R_{t_{max}}$ (with $t_{max}$ of the same order as N)
of $q \times q$ matrices, but the sole matrix U, with an important
reduction in the computational burden. Second, the full list
of the persistence probabilities $u_{cc}$ allows one to test the
significance of each single community, whereas the stability of
the clustering $r_t^H$ averages among all the communities.
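
The relation between U and the step-1 autocovariance can be checked numerically on a small case. The sketch below is an illustration under stated assumptions: the function names, the label-list encoding of H, and the 6-node network are hypothetical, and the stability value is computed literally from the definition quoted above.

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def clustered_autocovariance(P, pi, labels, q, t):
    """Return (R_t, Pi) with R_t = H' [diag(pi) P^t - pi' pi] H and Pi = pi H."""
    n = len(P)
    Pt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0
    for _ in range(t):
        Pt = matmul(Pt, P)
    Pi = [0.0] * q
    for i in range(n):
        Pi[labels[i]] += pi[i]
    # Start from -Pi' Pi, then add the entries of H' diag(pi) P^t H.
    R = [[-Pi[c] * Pi[d] for d in range(q)] for c in range(q)]
    for i in range(n):
        for j in range(n):
            R[labels[i]][labels[j]] += pi[i] * Pt[i][j]
    return R, Pi


# Toy example (hypothetical data): two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = [[0.0] * 6 for _ in range(6)]
for i, j in edges:
    W[i][j] = W[j][i] = 1.0
strength = [sum(row) for row in W]
P = [[w / strength[i] for w in W[i]] for i in range(6)]
pi = [s / sum(strength) for s in strength]  # closed form, undirected case
labels = [0, 0, 0, 1, 1, 1]

R0, _ = clustered_autocovariance(P, pi, labels, 2, t=0)
R1, Pi = clustered_autocovariance(P, pi, labels, 2, t=1)

# Recover U from R_1 + Pi' Pi = diag(Pi) U.
U = [[(R1[c][d] + Pi[c] * Pi[d]) / Pi[c] for d in range(2)] for c in range(2)]

# Stability at t = 1: minimum of trace(R_s) over s = 0, 1.
stability_1 = min(sum(R[c][c] for c in range(2)) for R in (R0, R1))
```

The diagonal of the recovered U gives the persistence probabilities of the two triangles, whereas the stability value collapses both communities into a single number.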

Another work with important connections is that of
Weinan et al. [37], who suggest explicitly finding the "best" (in
a suitable sense) q-state approximated lumped Markov chain.
This boils down to the formulation of a minimization problem,
after a metric on the space of stochastic matrices is
introduced. A drawback of this method, however, is that q must be
specified a priori, whereas identifying the correct number
of communities is often the main goal of the analysis. For the
same reason, it can hardly support the discussion of the
significance and convenience of choosing one partition instead of
another. We argue that this method could be used, jointly
with the one proposed in this paper, to generate a set of
partitions with increasing values of q = 2, 3, ..., by repeatedly
solving the above problem. Then, their significance could be
tested with the tool of the persistence probabilities' diagram.
It is not guaranteed, however, that the proposed partitions
are "good" in terms of the minimal $u_{cc}$ (i.e., that they are
α-partitions with large α). It is therefore a point deserving
further study.

Fig. 1. Four different partitions (with increasing number q of communities) of the same network. The persistence probabilities $u_{cc}$ remain rather large as long as the network
is partitioned into "natural" communities. Passing from (b) to (c), and from (c) to (d), significant communities are broken, with a sudden drop of the relevant persistence
probabilities.




Fig. 2. LFR benchmark network. Above: The cophenetic correlation coefficient C as a 
function of T. The maximum is attained at T = 12. Below: The dendrogram obtained with 
T = 12 (only half of the plot is presented for readability). 

Fig. 3. The persistence probabilities' diagram of the LFR benchmark network. For a partition
with q clusters, crosses denote the values of the q diagonal terms $u_{cc}$ of the matrix U. Vertical
straight lines are only for visual aid.

Fig. 4. The persistence probabilities' diagram of the Netscience network.

Fig. 5. Comparison of two persistence probabilities' diagrams for the Netscience network
(the two plots are in the same scale). Above: blow-up of the diagram of Fig. 4 (our results).
Below: the diagram related to the six partitions proposed in [32].

Fig. 6. Neural network. The dendrogram obtained with T = 3.

Fig. 7. The persistence probabilities' diagram of the neural network.



